Image Classification



Satellite images often need to be classified (assigned to a fixed set of types) or used for detection of various features of interest. Here we will look at the classification case, using labelled satellite images from various categories of the UCMerced LandUse dataset. scikit-learn is useful for general numeric data, but it does not have significant support for working with images. Luckily, there are various deep-learning and convolutional-network libraries that do support images well, including Keras (backed by TensorFlow), which we will use here. To run this notebook, you will first need to download the dataset and put it in ../data/.

In [1]:
import os
import intake
import glob

import numpy as np
import geoviews as gv
import holoviews as hv
import pandas as pd

gv.extension('bokeh')

Get the classes and files

In [2]:
path = '../data/UCMerced_LandUse/Images/'
classes = np.array([f.split('/')[-1] for f in glob.glob(path+'*')])
files = {c: glob.glob(os.path.join(path, c, '*')) for c in classes}
In [3]:
classes
Out[3]:
array(['forest', 'buildings', 'river', 'mobilehomepark', 'harbor',
       'golfcourse', 'agricultural', 'runway', 'baseballdiamond',
       'overpass', 'chaparral', 'tenniscourt', 'intersection', 'airplane',
       'parkinglot', 'sparseresidential', 'mediumresidential',
       'denseresidential', 'beach', 'freeway', 'storagetanks'],
      dtype='<U17')

Split files into train and test sets

In [4]:
train_set = list(np.random.choice(np.arange(100), 80, False))
test_set = [i for i in range(100) if i not in train_set]

train_files = {c: [f for f in fs if int(f[-6:-4]) in train_set] for c, fs in files.items()}
test_files  = {c: [f for f in fs if int(f[-6:-4]) in test_set]  for c, fs in files.items()}
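The index slicing above depends on the UCMerced filename convention, where each class directory holds images named `<class>00.tif` through `<class>99.tif`, so the two characters before the `.tif` suffix are the image index. A quick sketch (the path below is illustrative):

```python
# UCMerced images are named <class>NN.tif, so f[-6:-4] extracts the
# two-digit image index within the class.
f = '../data/UCMerced_LandUse/Images/forest/forest07.tif'
idx = int(f[-6:-4])
print(idx)  # 7
```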

Define function to sample from train or test set

In [5]:
def get_sample(cls, set='training'):
    files = train_files if set == 'training' else test_files
    flist = list(files[cls])
    f = flist[np.random.randint(len(flist))]
    return gv.RGB.load_tiff(f).relabel(cls)

Samples are loaded as xarrays:

In [6]:
get_sample(classes[0]).data
Out[6]:
<xarray.Dataset>
Dimensions:  (x: 256, y: 256)
Coordinates:
  * x        (x) float64 0.5 1.5 2.5 3.5 4.5 ... 251.5 252.5 253.5 254.5 255.5
  * y        (y) float64 255.5 254.5 253.5 252.5 251.5 ... 4.5 3.5 2.5 1.5 0.5
Data variables:
    R        (y, x) uint8 109 112 123 131 132 119 ... 142 138 141 147 122 145
    G        (y, x) uint8 111 114 125 132 134 120 ... 141 136 139 145 119 142
    B        (y, x) uint8 119 122 132 139 141 126 ... 138 134 136 146 120 143
Attributes:
    transform:   (1.0, 0.0, 0.0, 0.0, 1.0, 0.0)
    res:         (1.0, -1.0)
    is_tiled:    0
    nodatavals:  (nan, nan, nan)

But they are actually visualizable RGB images:

In [7]:
gv.Layout([get_sample(s) for s in np.random.choice(classes, 4)]).cols(2)
Out[7]:

Define the model

A simple convolutional network using Keras:

In [8]:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(100, 100, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(3, 3)))

model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())  # this converts our 3D feature maps to 1D feature vectors
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(21))
model.add(Activation('sigmoid'))  # note: 'softmax' is the conventional choice for single-label multi-class output

model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
Using TensorFlow backend.
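The network downsamples its 100×100 input step by step, and the size of the flattened feature vector feeding the first Dense layer can be checked with a little arithmetic: a valid 3×3 convolution shrinks each side by 2, and non-overlapping pooling floor-divides it. A quick sketch of that calculation:

```python
def conv(n, k=3):
    # A 'valid' convolution shrinks each spatial side by k - 1
    return n - (k - 1)

def pool(n, p):
    # Non-overlapping max-pooling floor-divides each side by p
    return n // p

n = 100
n = pool(conv(n), 3)   # 100 -> 98 -> 32
n = pool(conv(n), 2)   # 32 -> 30 -> 15
n = pool(conv(n), 2)   # 15 -> 13 -> 6
flat = n * n * 64      # 64 channels in the last Conv2D layer
print(flat)            # 2304 features into the Dense(64) layer
```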

Declare the data

We will define a generator that loads chunks of the data:

In [9]:
ntraining = 10000

def get_array(rgb):
    h, w = rgb.interface.shape(rgb, True)
    b = np.random.randint(h-100)
    l = np.random.randint(w-100)
    return np.dstack([np.flipud(rgb.dimension_values(d, flat=False)[b:b+100, l:l+100])/255 for d in rgb.vdims])

choices = np.random.choice(classes, ntraining)
class_list = list(classes)

def gen_samples(choices, set='training'):
    "Generates random arrays along with class labels"
    for c in choices:
        labels = np.zeros((21,))
        labels[class_list.index(c)] = 1
        yield get_array(get_sample(c, set))[np.newaxis, :], labels[np.newaxis, :]        
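The cropping in `get_array` can be illustrated in isolation; this sketch (using a synthetic array in place of a loaded tile) mirrors the random 100×100 window selection and the scaling of uint8 bands to [0, 1]:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)  # stand-in tile

b = rng.integers(256 - 100)          # bottom row of the crop window
l = rng.integers(256 - 100)          # left column of the crop window
crop = img[b:b+100, l:l+100] / 255   # float array in [0, 1]

print(crop.shape)  # (100, 100, 3)
```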

Run the model

In [10]:
%%time
history = model.fit_generator(gen_samples(choices), steps_per_epoch=100, epochs=100, verbose=1)
Epoch 1/100
100/100 [==============================] - 3s 28ms/step - loss: 3.0790 - acc: 0.0700
Epoch 2/100
100/100 [==============================] - 2s 20ms/step - loss: 3.0574 - acc: 0.0700
Epoch 3/100
100/100 [==============================] - 2s 21ms/step - loss: 3.0501 - acc: 0.0200
Epoch 4/100
100/100 [==============================] - 2s 21ms/step - loss: 3.0847 - acc: 0.0800
Epoch 5/100
100/100 [==============================] - 2s 23ms/step - loss: 3.0367 - acc: 0.1000
Epoch 6/100
100/100 [==============================] - 2s 23ms/step - loss: 3.0429 - acc: 0.1200
Epoch 7/100
100/100 [==============================] - 2s 22ms/step - loss: 3.0499 - acc: 0.0400
Epoch 8/100
100/100 [==============================] - 2s 21ms/step - loss: 3.0776 - acc: 0.0600
Epoch 9/100
100/100 [==============================] - 2s 23ms/step - loss: 3.0016 - acc: 0.0300
Epoch 10/100
100/100 [==============================] - 2s 22ms/step - loss: 3.0458 - acc: 0.0500
Epoch 11/100
100/100 [==============================] - 2s 23ms/step - loss: 2.9841 - acc: 0.1000
Epoch 12/100
100/100 [==============================] - 2s 23ms/step - loss: 2.9835 - acc: 0.0900
Epoch 13/100
100/100 [==============================] - 2s 22ms/step - loss: 2.9834 - acc: 0.0400
Epoch 14/100
100/100 [==============================] - 2s 22ms/step - loss: 3.0607 - acc: 0.0700
Epoch 15/100
100/100 [==============================] - 2s 23ms/step - loss: 3.0870 - acc: 0.0200
Epoch 16/100
100/100 [==============================] - 2s 23ms/step - loss: 3.0284 - acc: 0.0400
Epoch 17/100
100/100 [==============================] - 2s 23ms/step - loss: 2.9193 - acc: 0.0900
Epoch 18/100
100/100 [==============================] - 2s 23ms/step - loss: 2.9902 - acc: 0.0500
Epoch 19/100
100/100 [==============================] - 2s 22ms/step - loss: 2.9970 - acc: 0.0700
Epoch 20/100
100/100 [==============================] - 2s 23ms/step - loss: 2.9082 - acc: 0.1100
Epoch 21/100
100/100 [==============================] - 2s 23ms/step - loss: 3.0722 - acc: 0.0800
Epoch 22/100
100/100 [==============================] - 2s 22ms/step - loss: 2.8730 - acc: 0.0900
Epoch 23/100
100/100 [==============================] - 2s 21ms/step - loss: 2.9459 - acc: 0.1600
Epoch 24/100
100/100 [==============================] - 2s 22ms/step - loss: 2.8887 - acc: 0.1000
Epoch 25/100
100/100 [==============================] - 2s 23ms/step - loss: 2.8127 - acc: 0.1200
Epoch 26/100
100/100 [==============================] - 2s 21ms/step - loss: 2.9728 - acc: 0.1000
Epoch 27/100
100/100 [==============================] - 2s 22ms/step - loss: 2.9410 - acc: 0.1000
Epoch 28/100
100/100 [==============================] - 2s 22ms/step - loss: 2.9618 - acc: 0.0600
Epoch 29/100
100/100 [==============================] - 2s 24ms/step - loss: 2.9207 - acc: 0.0400
Epoch 30/100
100/100 [==============================] - 2s 22ms/step - loss: 2.9149 - acc: 0.0500
Epoch 31/100
100/100 [==============================] - 2s 22ms/step - loss: 2.8333 - acc: 0.0700
Epoch 32/100
100/100 [==============================] - 2s 21ms/step - loss: 2.8562 - acc: 0.1200
Epoch 33/100
100/100 [==============================] - 2s 22ms/step - loss: 2.8667 - acc: 0.0700
Epoch 34/100
100/100 [==============================] - 2s 22ms/step - loss: 2.8617 - acc: 0.1000
Epoch 35/100
100/100 [==============================] - 2s 20ms/step - loss: 2.9341 - acc: 0.0300
Epoch 36/100
100/100 [==============================] - 2s 20ms/step - loss: 2.8684 - acc: 0.1000
Epoch 37/100
100/100 [==============================] - 2s 21ms/step - loss: 2.8388 - acc: 0.0500
Epoch 38/100
100/100 [==============================] - 2s 22ms/step - loss: 2.9390 - acc: 0.0400
Epoch 39/100
100/100 [==============================] - 2s 21ms/step - loss: 2.8217 - acc: 0.0800
Epoch 40/100
100/100 [==============================] - 2s 21ms/step - loss: 2.8177 - acc: 0.1400
Epoch 41/100
100/100 [==============================] - 2s 21ms/step - loss: 2.7590 - acc: 0.0600
Epoch 42/100
100/100 [==============================] - 2s 25ms/step - loss: 2.8883 - acc: 0.1300
Epoch 43/100
100/100 [==============================] - 2s 24ms/step - loss: 2.8581 - acc: 0.1300
Epoch 44/100
100/100 [==============================] - 3s 25ms/step - loss: 2.8733 - acc: 0.1000
Epoch 45/100
100/100 [==============================] - 2s 23ms/step - loss: 2.7163 - acc: 0.1900
Epoch 46/100
100/100 [==============================] - 2s 23ms/step - loss: 2.7677 - acc: 0.1000
Epoch 47/100
100/100 [==============================] - 2s 22ms/step - loss: 2.8865 - acc: 0.1000
Epoch 48/100
100/100 [==============================] - 3s 26ms/step - loss: 2.8035 - acc: 0.0700
Epoch 49/100
100/100 [==============================] - 2s 21ms/step - loss: 2.6108 - acc: 0.1700
Epoch 50/100
100/100 [==============================] - 2s 23ms/step - loss: 2.8444 - acc: 0.0900
Epoch 51/100
100/100 [==============================] - 2s 23ms/step - loss: 2.7030 - acc: 0.0900
Epoch 52/100
100/100 [==============================] - 2s 23ms/step - loss: 2.9050 - acc: 0.1200
Epoch 53/100
100/100 [==============================] - 2s 24ms/step - loss: 2.8248 - acc: 0.0800
Epoch 54/100
100/100 [==============================] - 2s 24ms/step - loss: 2.8356 - acc: 0.0800
Epoch 55/100
100/100 [==============================] - 2s 24ms/step - loss: 2.8857 - acc: 0.1300
Epoch 56/100
100/100 [==============================] - 2s 23ms/step - loss: 2.8375 - acc: 0.1200
Epoch 57/100
100/100 [==============================] - 2s 22ms/step - loss: 2.8120 - acc: 0.1100
Epoch 58/100
100/100 [==============================] - 2s 24ms/step - loss: 2.7842 - acc: 0.1400
Epoch 59/100
100/100 [==============================] - 2s 23ms/step - loss: 2.7306 - acc: 0.1000
Epoch 60/100
100/100 [==============================] - 2s 22ms/step - loss: 2.8305 - acc: 0.1400
Epoch 61/100
100/100 [==============================] - 2s 22ms/step - loss: 2.7665 - acc: 0.1300
Epoch 62/100
100/100 [==============================] - 2s 23ms/step - loss: 2.9398 - acc: 0.1000
Epoch 63/100
100/100 [==============================] - 2s 20ms/step - loss: 2.7723 - acc: 0.1100
Epoch 64/100
100/100 [==============================] - 2s 20ms/step - loss: 2.6766 - acc: 0.1200
Epoch 65/100
100/100 [==============================] - 2s 23ms/step - loss: 2.8159 - acc: 0.1300
Epoch 66/100
100/100 [==============================] - 2s 21ms/step - loss: 2.7869 - acc: 0.0500
Epoch 67/100
100/100 [==============================] - 2s 22ms/step - loss: 2.8221 - acc: 0.1400
Epoch 68/100
100/100 [==============================] - 2s 22ms/step - loss: 2.8165 - acc: 0.1000
Epoch 69/100
100/100 [==============================] - 2s 21ms/step - loss: 2.8223 - acc: 0.0700
Epoch 70/100
100/100 [==============================] - 2s 21ms/step - loss: 2.8460 - acc: 0.0600
Epoch 71/100
100/100 [==============================] - 2s 21ms/step - loss: 2.6816 - acc: 0.1900
Epoch 72/100
100/100 [==============================] - 2s 21ms/step - loss: 2.7338 - acc: 0.1500
Epoch 73/100
100/100 [==============================] - 2s 21ms/step - loss: 2.7712 - acc: 0.1200
Epoch 74/100
100/100 [==============================] - 2s 21ms/step - loss: 2.6798 - acc: 0.1800
Epoch 75/100
100/100 [==============================] - 2s 21ms/step - loss: 2.9016 - acc: 0.1100
Epoch 76/100
100/100 [==============================] - 2s 21ms/step - loss: 2.7174 - acc: 0.0900
Epoch 77/100
100/100 [==============================] - 2s 21ms/step - loss: 2.6493 - acc: 0.1400
Epoch 78/100
100/100 [==============================] - 2s 21ms/step - loss: 2.7494 - acc: 0.1800
Epoch 79/100
100/100 [==============================] - 2s 24ms/step - loss: 2.7806 - acc: 0.0900
Epoch 80/100
100/100 [==============================] - 2s 22ms/step - loss: 2.7023 - acc: 0.1100
Epoch 81/100
100/100 [==============================] - 2s 25ms/step - loss: 2.8758 - acc: 0.0900
Epoch 82/100
100/100 [==============================] - 2s 24ms/step - loss: 2.6501 - acc: 0.1300
Epoch 83/100
100/100 [==============================] - 2s 22ms/step - loss: 2.8998 - acc: 0.0800
Epoch 84/100
100/100 [==============================] - 2s 23ms/step - loss: 2.7271 - acc: 0.1700
Epoch 85/100
100/100 [==============================] - 2s 23ms/step - loss: 2.6815 - acc: 0.1400
Epoch 86/100
100/100 [==============================] - 2s 24ms/step - loss: 2.7318 - acc: 0.1400
Epoch 87/100
100/100 [==============================] - 2s 24ms/step - loss: 2.7537 - acc: 0.1300
Epoch 88/100
100/100 [==============================] - 2s 21ms/step - loss: 2.9017 - acc: 0.1000
Epoch 89/100
100/100 [==============================] - 2s 21ms/step - loss: 2.7673 - acc: 0.1300
Epoch 90/100
100/100 [==============================] - 2s 23ms/step - loss: 2.7903 - acc: 0.1000
Epoch 91/100
100/100 [==============================] - 2s 24ms/step - loss: 2.8658 - acc: 0.1600
Epoch 92/100
100/100 [==============================] - 2s 22ms/step - loss: 2.7843 - acc: 0.1500
Epoch 93/100
100/100 [==============================] - 2s 23ms/step - loss: 2.8030 - acc: 0.1500
Epoch 94/100
100/100 [==============================] - 2s 21ms/step - loss: 2.6355 - acc: 0.1300
Epoch 95/100
100/100 [==============================] - 2s 22ms/step - loss: 2.8706 - acc: 0.0700
Epoch 96/100
100/100 [==============================] - 3s 25ms/step - loss: 2.8197 - acc: 0.1000
Epoch 97/100
100/100 [==============================] - 2s 22ms/step - loss: 2.6638 - acc: 0.1700
Epoch 98/100
100/100 [==============================] - 2s 23ms/step - loss: 2.6979 - acc: 0.1600
Epoch 99/100
100/100 [==============================] - 2s 22ms/step - loss: 2.7407 - acc: 0.1600
Epoch 100/100
100/100 [==============================] - 2s 22ms/step - loss: 2.6611 - acc: 0.1200
CPU times: user 6min 39s, sys: 55.3 s, total: 7min 35s
Wall time: 3min 44s

Evaluate the model

In [11]:
(hv.Curve(history.history['loss'], 'Iteration', 'Loss'    ).options(width=400) +
 hv.Curve(history.history['acc'],  'Iteration', 'Accuracy').options(width=400))
Out[11]:

Now let us test the predictions on the test set, first visually:

In [12]:
def get_prediction(cls):
    sample = get_sample(cls, 'test')
    array = get_array(sample)[np.newaxis, ...]
    p = model.predict(array).argmax()
    p = classes[p]
    return sample.relabel('Predicted: %s - Actual: %s' % (p, cls))

opts = dict(fontsize={'title': '8pt'}, xaxis=None, yaxis=None, width=250, height=250)
hv.Layout([get_prediction(cls).options(**opts) for cls in classes[:20]]).cols(3)
Out[12]:

And now numerically for 500 predictions:

In [13]:
ntesting = 500
choices = np.random.choice(classes, ntesting)
class_list = list(classes)

prediction = model.predict_generator(gen_samples(choices), steps=ntesting)
predictions = classes[prediction.argmax(axis=1)]

accuracy = (predictions==choices).sum()/ntesting

print(f'Accuracy on test set {accuracy}')
Accuracy on test set 0.18

Next we can see how well the classifier performs on the different categories. We'll run 20 predictions on each category:

In [14]:
def predict(cls, iterations=20):
    accurate, predictions = [], []
    for i in range(iterations):
        sample = get_sample(cls, 'test')
        array = get_array(sample)[np.newaxis, ...]
        p = model.predict(array).argmax()
        p = classes[p]
        predictions.append(p)
        accurate.append(p == cls)
    return np.sum(accurate)/float(iterations), predictions

accuracies = [(c, *predict(c)) for c in classes]

We can now visualize this data as a bar chart:

In [15]:
df = pd.DataFrame(accuracies, columns=['landuse', 'accuracy', 'predictions'])

hv.Bars(df, 'landuse', 'accuracy').options(width=700, xrotation=45, color_index='landuse', 
                                           cmap='Category20', show_legend=False)
Out[15]:

Another interesting way of viewing this data is to look at which categories the classifier got confused on. We will count how many times the classifier classified one category as another category and visualize the result as a Chord graph where each edge is colored by the predicted category. By clicking on a node we can reveal which other categories incorrectly identified an image as being of that category:

In [16]:
pdf = pd.DataFrame([(p, l) for (_, l, _, ps) in df.itertuples() for p in ps], columns=['Prediction', 'Actual'])
graph = pdf.groupby(['Prediction', 'Actual']).size().to_frame().reset_index()

hv.Chord(graph.rename(columns={0: 'Count'})).relabel('Misclassification Graph').options(
    node_color='index', cmap='Category20', edge_color_index='Actual', label_index='index',
    width=600, height=600)
Out[16]:

Clicking on buildings, for instance, reveals a lot of confusion with overpasses, mediumresidential, and intersections, all of which share visual features. Conversely, a number of buildings images were misidentified as parkinglots, which is also understandable. As we saw in the bar chart above, forest, on the other hand, has most of its edges leading back to itself, reflecting the high accuracy observed for that category of images.
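The pair counting that feeds the chord diagram is just a groupby over (Prediction, Actual) pairs; a toy sketch with made-up labels:

```python
import pandas as pd

pdf = pd.DataFrame({
    'Prediction': ['forest', 'forest', 'buildings', 'overpass'],
    'Actual':     ['forest', 'forest', 'overpass',  'buildings'],
})
# Count how often each (Prediction, Actual) pair occurs
graph = pdf.groupby(['Prediction', 'Actual']).size().to_frame('Count').reset_index()
print(graph)
```

Diagonal pairs (Prediction == Actual) correspond to self-edges in the chord graph, i.e. correct classifications.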